
feat: add OpenAI diarization support #651

Open
8times4 wants to merge 2 commits into TanStack:main from 8times4:feat/openai-transcription-diarization

Conversation

@8times4 8times4 commented May 27, 2026

🎯 Changes

This change adds diarization support for OpenAI's gpt-4o-transcribe-diarize model, based on https://developers.openai.com/api/docs/guides/speech-to-text?lang=javascript

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

  • New Features

    • Added OpenAI speaker diarization support (gpt-4o-transcribe-diarize) for multi-speaker audio
    • Added diarized_json response format with speaker-labeled segments
    • Added configurable chunking strategy and diarization-related options
  • Documentation

    • Updated transcription docs, adapter guides, examples, and best practices with diarization usage and constraints
  • Tests

    • Added tests covering diarization requests, parsing/mapping, and validation rules
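For readers consuming the speaker-labeled segments, the following is a minimal sketch of turning them into a readable transcript. The `SpeakerSegment` shape and both helper functions are hypothetical; they only assume the `speaker`, `start`, `end`, and `text` fields named in this PR, and the real `TranscriptionSegment` type may carry more.

```typescript
// Hypothetical segment shape, modeled on the speaker/start/end/text fields
// mentioned in this PR. Not the library's actual type.
interface SpeakerSegment {
  speaker?: string
  start: number
  end: number
  text: string
}

// Merge consecutive segments spoken by the same speaker into one turn.
function groupBySpeakerTurn(segments: SpeakerSegment[]): SpeakerSegment[] {
  const turns: SpeakerSegment[] = []
  for (const segment of segments) {
    const last = turns[turns.length - 1]
    if (last && last.speaker === segment.speaker) {
      last.end = segment.end
      last.text = `${last.text} ${segment.text.trim()}`
    } else {
      // Copy so the input array is never mutated.
      turns.push({ ...segment, text: segment.text.trim() })
    }
  }
  return turns
}

// Render one turn as "[start-end] speaker: text".
function formatTurn(turn: SpeakerSegment): string {
  const who = turn.speaker ?? 'unknown'
  return `[${turn.start.toFixed(1)}s-${turn.end.toFixed(1)}s] ${who}: ${turn.text}`
}

const sample: SpeakerSegment[] = [
  { speaker: 'A', start: 0, end: 1.5, text: 'Hello there.' },
  { speaker: 'A', start: 1.5, end: 3, text: ' How are you?' },
  { speaker: 'B', start: 3.2, end: 4, text: 'Fine, thanks.' },
]
const lines = groupBySpeakerTurn(sample).map(formatTurn)
console.log(lines.join('\n'))
```

Grouping by turn is a display choice only; the per-segment timestamps remain available in the original array.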


coderabbitai Bot commented May 27, 2026

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b2c455a0-25d6-4921-8f26-77965d2791be

📥 Commits

Reviewing files that changed from the base of the PR and between 05dfb53 and fbb57a0.

📒 Files selected for processing (6)
  • .changeset/openai-transcription-diarization.md
  • docs/adapters/openai.md
  • docs/comparison/vercel-ai-sdk.md
  • docs/media/generation-hooks.md
  • docs/media/transcription.md
  • docs/reference/interfaces/TranscriptionOptions.md
✅ Files skipped from review due to trivial changes (5)
  • .changeset/openai-transcription-diarization.md
  • docs/media/generation-hooks.md
  • docs/comparison/vercel-ai-sdk.md
  • docs/adapters/openai.md
  • docs/reference/interfaces/TranscriptionOptions.md

📝 Walkthrough

Adds end-to-end speaker diarization for OpenAI transcription: new gpt-4o-transcribe-diarize handling, diarized_json support across types, adapter logic and validation, tests covering defaults and error cases, and documentation/changeset updates.

Changes

OpenAI Transcription Diarization Feature

| Layer | File(s) | Summary |
| --- | --- | --- |
| Response Format Type Contracts | packages/ai/src/types.ts, packages/ai/src/activities/generateTranscription/index.ts, packages/ai-client/src/generation-types.ts, packages/ai-openai/src/audio/transcription-provider-options.ts, docs/reference/interfaces/TranscriptionOptions.md | responseFormat unions are extended to include 'diarized_json'. OpenAITranscriptionProviderOptions adds optional chunking_strategy supporting 'auto', VAD config, or null. |
| OpenAI Adapter Diarization Implementation | packages/ai-openai/src/adapters/transcription.ts | Adapter detects diarization-capable models, validates diarization options, maps requests to diarized_json when appropriate, auto-sets chunking_strategy: 'auto' for the diarize model by default, parses diarized segments into TranscriptionSegment[] with speaker/start/end/text, and preserves non-diarized paths. |
| Diarization Adapter Test Coverage | packages/ai-openai/tests/transcription-adapter.test.ts | Vitest suite verifies default diarization wiring (diarized_json, chunking_strategy: 'auto'), explicit options forwarding (server VAD, known speakers), chunking_strategy: null passthrough, alternative response formats on diarize model, and validation error cases (unsupported options, speaker metadata limits/mismatch). |
| Documentation and Changeset | .changeset/openai-transcription-diarization.md, docs/media/transcription.md, docs/adapters/openai.md, docs/media/generation-hooks.md, docs/comparison/vercel-ai-sdk.md, docs/reference/interfaces/TranscriptionOptions.md, packages/ai/skills/ai-core/media-generation/SKILL.md | Changeset and docs updated to document gpt-4o-transcribe-diarize, diarized_json format, timestamp_granularities, diarization chunking_strategy guidance, and updated Whisper examples using responseFormat: 'verbose_json'. |
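The segment-parsing step described in the adapter row can be sketched roughly as follows. The wire shapes (`DiarizedResponseWire`, `DiarizedSegmentWire`) and the function name are assumptions for illustration; the only details taken from this PR are the speaker/start/end/text fields and the pattern of omitting `segments` when the list is empty.

```typescript
// Assumed wire shape of a diarized response; the real payload may include
// additional fields that this sketch ignores.
interface DiarizedSegmentWire {
  speaker: string
  start: number
  end: number
  text: string
}
interface DiarizedResponseWire {
  text: string
  segments?: DiarizedSegmentWire[]
}

// Hypothetical result types standing in for the library's own.
interface SegmentSketch {
  start: number
  end: number
  text: string
  speaker?: string
}
interface ResultSketch {
  text: string
  segments?: SegmentSketch[]
}

// Map each wire segment to a speaker-labeled segment, and only attach
// `segments` to the result when at least one segment exists.
function mapDiarizedResponse(response: DiarizedResponseWire): ResultSketch {
  const segments: SegmentSketch[] = (response.segments ?? []).map((segment) => ({
    start: segment.start,
    end: segment.end,
    text: segment.text,
    speaker: segment.speaker,
  }))
  return {
    text: response.text,
    ...(segments.length > 0 && { segments }),
  }
}

const withSpeakers = mapDiarizedResponse({
  text: 'Hi. Hello.',
  segments: [
    { speaker: 'A', start: 0, end: 0.8, text: 'Hi.' },
    { speaker: 'B', start: 1, end: 1.9, text: 'Hello.' },
  ],
})
const withoutSegments = mapDiarizedResponse({ text: 'Hi.' })
console.log(withSpeakers, withoutSegments)
```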

Sequence Diagram

sequenceDiagram
  participant Adapter as OpenAI Adapter
  participant Validator as validateDiarizationOptions
  participant Mapper as mapResponseFormat
  participant OpenAI as OpenAI API
  participant Parser as Diarized Parser

  Adapter->>Adapter: Identify diarization-capable model
  Adapter->>Validator: Validate diarization options
  Validator-->>Adapter: Constraints enforced
  Adapter->>Mapper: Map responseFormat
  Mapper-->>Adapter: diarized_json selected or mapped format
  Adapter->>OpenAI: Create transcription request (response_format, chunking_strategy)
  OpenAI-->>Adapter: Diarized or non-diarized response
  Adapter->>Parser: Map segments with speaker labels
  Parser-->>Adapter: TranscriptionSegment[]
  Adapter-->>Adapter: Return structured transcription result
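As a rough illustration of the request-building steps in the diagram, the sketch below picks a response format and a default chunking strategy. All type and function names are hypothetical, the VAD config shape is an assumption, and validation is assumed to have run before this step.

```typescript
// Sketch only: names and shapes below are assumptions, not the adapter's code.
type ResponseFormatSketch =
  | 'json'
  | 'text'
  | 'srt'
  | 'verbose_json'
  | 'vtt'
  | 'diarized_json'

// Assumed VAD config shape; the walkthrough only says "VAD config".
interface ServerVadSketch {
  type: 'server_vad'
  silence_duration_ms?: number
}
type ChunkingStrategySketch = 'auto' | ServerVadSketch | null

interface RequestOptionsSketch {
  responseFormat?: ResponseFormatSketch
  chunking_strategy?: ChunkingStrategySketch
}
interface RequestSketch {
  model: string
  response_format: ResponseFormatSketch
  chunking_strategy?: ChunkingStrategySketch
}

function isDiarizeModelSketch(model: string): boolean {
  return model === 'gpt-4o-transcribe-diarize'
}

function buildRequestSketch(
  model: string,
  options: RequestOptionsSketch,
): RequestSketch {
  const diarize = isDiarizeModelSketch(model)
  const request: RequestSketch = {
    model,
    // An explicit format wins; otherwise the diarize model defaults to diarized_json.
    response_format: options.responseFormat ?? (diarize ? 'diarized_json' : 'json'),
  }
  if (options.chunking_strategy !== undefined) {
    // Explicit values, including null, are passed through unchanged.
    request.chunking_strategy = options.chunking_strategy
  } else if (diarize) {
    request.chunking_strategy = 'auto'
  }
  return request
}

const diarizedDefault = buildRequestSketch('gpt-4o-transcribe-diarize', {})
const diarizedNoChunking = buildRequestSketch('gpt-4o-transcribe-diarize', {
  chunking_strategy: null,
})
const whisperDefault = buildRequestSketch('whisper-1', {})
console.log(diarizedDefault, diarizedNoChunking, whisperDefault)
```

Comparing against `undefined` rather than using a truthiness check is what lets an explicit `null` survive as a distinct value.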

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • tombeckenham
  • jherr

Poem

🐰
Voices hop across the line,
Speakers sorted, timestamps fine,
Chunks arranged, each name defined,
JSON brings the chorus timed,
A rabbit cheers: "Diarize!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title accurately summarizes the main feature added: OpenAI diarization support for the gpt-4o-transcribe-diarize model, which is the primary focus across all changes. |
| Description check | ✅ Passed | The PR description follows the template structure, includes a clear explanation of changes with an OpenAI API reference, and all checklist items are properly completed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-openai/src/adapters/transcription.ts`:
- Around line 267-285: The diarization validation is missing a local guard for
responseFormat: update validateDiarizationOptions (used by transcribe and
guarded by isDiarizeTranscriptionModel) to throw when
modelOptions.responseFormat (or the mapped value from mapResponseFormat) is not
one of the allowed values ["json","text","diarized_json"]; ensure transcribe()
cannot send srt/vtt/verbose_json for diarize models by checking
modelOptions.responseFormat (or resolved response format) early and throwing a
clear error stating diarization models only support json, text, and
diarized_json; reference validateDiarizationOptions, transcribe,
mapResponseFormat, and isDiarizeTranscriptionModel when applying the change.
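
A minimal sketch of the early guard this comment asks for, assuming the allowed set is exactly json, text, and diarized_json as stated above. The function name `assertDiarizeResponseFormat` is hypothetical; in the PR the check would live inside `validateDiarizationOptions`.

```typescript
// Formats the review comment lists as valid for diarization models.
const DIARIZE_RESPONSE_FORMATS: readonly string[] = ['json', 'text', 'diarized_json']

// Hypothetical stand-in for the check inside validateDiarizationOptions.
// Throws before any request is sent, instead of letting the API return a 400.
function assertDiarizeResponseFormat(model: string, responseFormat?: string): void {
  if (model !== 'gpt-4o-transcribe-diarize') return
  if (responseFormat === undefined) return
  if (DIARIZE_RESPONSE_FORMATS.indexOf(responseFormat) === -1) {
    throw new Error(
      `${model} only supports response formats json, text, and diarized_json; ` +
        `received "${responseFormat}".`,
    )
  }
}

// Usage: srt is rejected locally for the diarize model.
let rejectedMessage = ''
try {
  assertDiarizeResponseFormat('gpt-4o-transcribe-diarize', 'srt')
} catch (error) {
  rejectedMessage = error instanceof Error ? error.message : String(error)
}
console.log(rejectedMessage)

// Other models are unaffected by this guard.
assertDiarizeResponseFormat('whisper-1', 'srt')
```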

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7c4b4b31-fb90-4e00-9d8f-1454f513e089

📥 Commits

Reviewing files that changed from the base of the PR and between 5634f18 and a59d368.

📒 Files selected for processing (13)
  • .changeset/openai-transcription-diarization.md
  • docs/adapters/openai.md
  • docs/comparison/vercel-ai-sdk.md
  • docs/media/generation-hooks.md
  • docs/media/transcription.md
  • docs/reference/interfaces/TranscriptionOptions.md
  • packages/ai-client/src/generation-types.ts
  • packages/ai-openai/src/adapters/transcription.ts
  • packages/ai-openai/src/audio/transcription-provider-options.ts
  • packages/ai-openai/tests/transcription-adapter.test.ts
  • packages/ai/skills/ai-core/media-generation/SKILL.md
  • packages/ai/src/activities/generateTranscription/index.ts
  • packages/ai/src/types.ts

Comment thread packages/ai-openai/src/adapters/transcription.ts

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@AlemTuzlak AlemTuzlak requested a review from tombeckenham June 3, 2026 14:45
@tombeckenham tombeckenham force-pushed the feat/openai-transcription-diarization branch from 05dfb53 to fbb57a0 Compare June 4, 2026 03:47
nx-cloud Bot commented Jun 4, 2026

View your CI Pipeline Execution ↗ for commit fbb57a0

| Command | Status | Duration | Result |
| --- | --- | --- | --- |
| nx affected --targets=test:sherif,test:knip,tes... | ✅ Succeeded | 4m 17s | View ↗ |
| nx run-many --targets=build --exclude=examples/... | ✅ Succeeded | 1m 8s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-06-04 03:56:19 UTC

pkg-pr-new Bot commented Jun 4, 2026

@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@651

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@651

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@651

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@651

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@651

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@651

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@651

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@651

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@651

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@651

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@651

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@651

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@651

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@651

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@651

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@651

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@651

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@651

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@651

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@651

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@651

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@651

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@651

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@651

@tanstack/ai-utils

npm i https://pkg.pr.new/@tanstack/ai-utils@651

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@651

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@651

@tanstack/openai-base

npm i https://pkg.pr.new/@tanstack/openai-base@651

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@651

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@651

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@651

commit: fbb57a0

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/media/transcription.md`:
- Line 561: The example hardcodes 'whisper-1' in the createOpenaiTranscription
call; update the docs to use the provider's latest transcription model constant
exported from the OpenAI adapter's model-meta.ts instead of a string literal.
Import or reference the exported latest-model symbol from that file (e.g., the
adapter's LATEST_* or DEFAULT_* transcription model constant) and pass that
symbol into createOpenaiTranscription so the docs always use the adapter-defined
current OpenAI transcription model.

@coderabbitai coderabbitai Bot left a comment

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🛑 Comments failed to post (1)
docs/media/transcription.md (1)

561-561: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the provider’s latest OpenAI transcription model in this example.

This changed snippet still hardcodes whisper-1; please update it to the latest OpenAI transcription model defined in the adapter model-meta.ts to keep docs aligned with project policy.

As per coding guidelines: “Use the latest model per provider in documentation example code, sourced from each adapter's model-meta.ts (newest gpt-*, claude-*, gemini-*, …)”.


@tombeckenham

Hi @8times4, thank you for this. Would you be able to create an e2e test for this using aimock? The tests are in the e2e test package. Ideally, adding a way to see the results on one of the ts-react-chat example pages would be great as well

@tombeckenham

Code review

Found 3 issues:

  1. No E2E test coverage added for the diarization feature/behavior change (new gpt-4o-transcribe-diarize model, diarized_json responseFormat, speaker-labeled TranscriptionSegments, chunking_strategy + known_speaker_* options + validation). (CLAUDE.md says "Every feature, bug fix, or behavior change MUST include E2E test coverage." and "Add or update E2E tests — this is mandatory for any feature, bug fix, or behavior change"; see also the new-feature row in the E2E table and the Pre-PR Quality Gate requiring pnpm --filter @tanstack/ai-e2e test:e2e. AGENTS.md and the reviews of prior transcription PRs #409 ("feat: extract @tanstack/openai-base and @tanstack/ai-utils packages") and #506 ("feat(ai-grok): audio, speech, and realtime adapters + example wiring") establish the same convention: update feature-support.ts + test-matrix + fixture + spec.)

        id: generateId(this.name),
        model,
        text: response.text,
        duration: response.duration,
        ...(segments.length > 0 && { segments }),
      }
    }
    if (useVerbose) {
      const response = (await this.client.audio.transcriptions.create({
        ...request,

  2. responseFormat union literal duplicated (with added | 'diarized_json') across three locations instead of extracting a shared type. (CLAUDE.md says "Always look for repeated code or if the function you are trying to implement is already in another file" and "Review code at the end to see if you can make it more concise and less repetitive".)

ai/packages/ai/src/types.ts

Lines 1723 to 1732 in 05dfb53

  confidence?: number
  /** Speaker identifier, if diarization is enabled */
  speaker?: string
}

/**
 * A single word with timing information.
 */
export interface TranscriptionWord {
  /** The transcribed word */
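
One way to address the duplication is to define the format list once and derive the type from it. This is a sketch under assumptions: the name `TranscriptionResponseFormat`, the exact members of the union, and the two consumer interfaces are illustrative and not taken from the repository.

```typescript
// Single source of truth for the allowed formats (member list is assumed).
const TRANSCRIPTION_RESPONSE_FORMATS = [
  'json',
  'text',
  'srt',
  'verbose_json',
  'vtt',
  'diarized_json',
] as const

// Derived union type; in a real package this would be exported and imported
// by each file that currently repeats the literal union.
type TranscriptionResponseFormat = (typeof TRANSCRIPTION_RESPONSE_FORMATS)[number]

// Runtime guard built from the same list, so type and check cannot drift apart.
function isTranscriptionResponseFormat(
  value: string,
): value is TranscriptionResponseFormat {
  return (TRANSCRIPTION_RESPONSE_FORMATS as readonly string[]).indexOf(value) !== -1
}

// Hypothetical consumers that would otherwise each repeat the union.
interface CoreOptionsSketch {
  responseFormat?: TranscriptionResponseFormat
}
interface ClientOptionsSketch {
  responseFormat?: TranscriptionResponseFormat
}

const coreOptions: CoreOptionsSketch = { responseFormat: 'diarized_json' }
const clientOptions: ClientOptionsSketch = { responseFormat: coreOptions.responseFormat }
console.log(clientOptions.responseFormat, isTranscriptionResponseFormat('mp3'))
```

With this layout, adding a future format means touching one array rather than three unions.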

  3. Validation guards in the newly added validateDiarizationOptions (and caller guard) are inconsistent with modelOptions conventions and incomplete: camelCase cast for responseFormat inside modelOptions (while spread + all other fields use snake_case response_format/chunking_strategy/known_speaker_*); prompt rejection and diarization-options guard only inspect top-level (not modelOptions paths); chunking_strategy diarize-only restriction does not check modelOptions?.chunking_strategy. This allows bypasses leading to late 400s instead of early errors. (CLAUDE.md says "Don't create fallback code. It hides problems. Just display errors to the user".)

      )
    }
  }
  protected mapResponseFormat(
    format?: OpenAITranscriptionResponseFormat,
  ): OpenAITranscriptionResponseFormat {
    if (!format) return 'json'
    return format
  }
}
/**
 * Creates an OpenAI transcription adapter with explicit API key.
 * Type resolution happens here at the call site.
 *
 * @param model - The model name (e.g., 'whisper-1')
 * @param apiKey - Your OpenAI API key
 * @param config - Optional additional configuration
 * @returns Configured OpenAI transcription adapter instance with resolved types
 *
 * @example
 * ```typescript
 * const adapter = createOpenaiTranscription('whisper-1', "sk-...");
 *
 * const result = await generateTranscription({
 *   adapter,
 *   audio: audioFile,
 *   language: 'en'
 * });
 * ```
 */
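
To make issue 3 concrete, here is a hedged sketch of a validator that inspects both the top-level options and the `modelOptions` paths, using snake_case keys inside `modelOptions`. The option shapes and the function name are assumptions for illustration; the rules encoded (no prompt on the diarize model, diarization-only options rejected elsewhere, matching known-speaker lists) follow the review text rather than the actual adapter source.

```typescript
// Assumed option shapes; only the field names come from the review text.
interface ModelOptionsSketch {
  prompt?: string
  chunking_strategy?: unknown
  known_speaker_names?: string[]
  known_speaker_references?: string[]
}
interface TranscribeOptionsSketch {
  model: string
  prompt?: string
  modelOptions?: ModelOptionsSketch
}

// Hypothetical validator: checks top-level and modelOptions paths alike so a
// value cannot bypass the guard by moving into modelOptions.
function validateOptionsSketch(options: TranscribeOptionsSketch): void {
  const diarize = options.model === 'gpt-4o-transcribe-diarize'
  const modelOptions = options.modelOptions ?? {}
  const names = modelOptions.known_speaker_names
  const references = modelOptions.known_speaker_references

  if (diarize) {
    const prompt = options.prompt ?? modelOptions.prompt
    if (prompt !== undefined) {
      throw new Error(`${options.model} does not support prompt.`)
    }
    if ((names?.length ?? 0) !== (references?.length ?? 0)) {
      throw new Error(
        'known_speaker_names and known_speaker_references must have the same length.',
      )
    }
    return
  }

  // Non-diarize models: reject diarization-only options wherever they appear.
  if (modelOptions.chunking_strategy !== undefined) {
    throw new Error(`chunking_strategy is not supported for ${options.model}.`)
  }
  if (names !== undefined || references !== undefined) {
    throw new Error(`known_speaker_* options are not supported for ${options.model}.`)
  }
}

// A valid diarize request passes silently.
validateOptionsSketch({
  model: 'gpt-4o-transcribe-diarize',
  modelOptions: { known_speaker_names: ['Ann'], known_speaker_references: ['data:...'] },
})
console.log('valid diarize options accepted')
```

Throwing early here matches the quoted guideline about surfacing errors rather than hiding them behind fallbacks.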

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.


2 participants